Crowd-sourcing NLG Data: Pictures Elicit Better Data
Authors
Abstract
Recent advances in corpus-based Natural Language Generation (NLG) hold the promise of being easily portable across domains, but require costly training data, consisting of meaning representations (MRs) paired with Natural Language (NL) utterances. In this work, we propose a novel framework for crowdsourcing high-quality NLG training data, using automatic quality control measures and evaluating different MRs with which to elicit data. We show that pictorial MRs result in better NL data being collected than logic-based MRs: utterances elicited by pictorial MRs are judged as significantly more natural, more informative, and better phrased, with a significant increase in average quality ratings (around 0.5 points on a 6-point scale) compared to using the logical MRs. As the MR becomes more complex, the benefits of pictorial stimuli increase. The collected data will be released as part of this submission.
Similar resources
Automatic Corpus Extension for Data-driven Natural Language Generation
As data-driven approaches started to make their way into the Natural Language Generation (NLG) domain, the need for automation of corpus building and extension became apparent. Corpus creation and extension in the data-driven NLG domain traditionally involved manual paraphrasing performed either by a group of experts or through crowd-sourcing. Building the training corpora manually is a cost...
Using Expertise for Crowd-Sourcing
In this paper, we examine whether the use of expertise ratings can help crowd-sourcing systems. We show, using simulations, that a crowd-sourcing system based on social navigation works better when users' expertise levels are taken into account.
Crowd-Sourced Iterative Annotation for Narrative Summarization Corpora
We present an iterative annotation process for producing aligned, parallel corpora of abstractive and extractive summaries for narrative. Our approach uses a combination of trained annotators and crowd-sourcing, allowing us to elicit human-generated summaries and alignments quickly and at low cost. We use crowd-sourcing to annotate aligned phrases with the text-to-text generation techniques nee...
A Comparative Analysis of Crowdsourced Natural Language Corpora for Spoken Dialog Systems
Recent spoken dialog systems have been able to recognize freely spoken user input in restricted domains thanks to statistical methods in automatic speech recognition. These methods require a large number of natural language utterances to train the speech recognition engine and to assess the quality of the system. Since human speech offers many variants associated with a single intent, a high...
Mapping Community Engagement with Urban Crowd-Sourcing
Cities are highly dynamic entities, with urban elements such as businesses, cultural and social Points-of-Interest (POIs), housing, transportation and the like, continuously changing. In order to maintain accurate spatial information in these settings, crowd-sourcing models of data collection, such as in OpenStreetMap (OSM), have come under investigation. Like many crowd-sourcing platforms (e.g...